Code for Sacramento: Equity Asset Map Project
Change Log
This project focuses on the asset map item for the Code for America (CFA) National Day of Civic Hacking (NDoCH) event. It consists of this notebook, which visualizes various assets within the Sacramento area. The data is sourced from various open data portals and provides the deliverables listed below.
Deliverables
Note: Data processing and maps are created using Jupyter Notebook due to its ability to visualize results effectively and efficiently. Listed below are installation instructions and more about Jupyter and Python.
The NDoCH event instructions are shown below and specify that open data sources should be used to visualize resources available to the Sacramento community. This notebook is intended as a starting point for visualizing such data and for further development.
Asset mapping is an integral part of empowered community building that is based on understanding the strengths and needs of diverse communities. First, use publicly available information about your locale to give a sense of the landscape and demographics. Next, research the location and availability of government programs (e.g. county health and human services offices), community based organizations (like resource centers, food banks, and legal aid clinics) or other resources that are vital to your community. Visually documenting the landscape can help identify what might make your community more equitable and accessible to all who live there.
This notebook starts with a tutorial using Python mapping tools as a prototype, then develops asset maps for the Sacramento area. Open data sources are listed below, and more will be added as development continues.
One desired outcome for this project is to develop a better understanding of publicly available data, appropriate tools and spatial analysis techniques for developing the asset maps. As a result, project methodology, assumptions and results are documented in this readme.
Key project assumptions are listed below and validated through analysis and visualization of publicly available datasets. They serve as an outline to guide development of the notebook, analysis and visualizations.
Assumptions
Asset maps are summarized below and organized into separate modules within the notebook. Each map is available as a separate HTML file in the maps folder.
Data Processing Steps
Results
SF Open Data
Tutorials
Results
Note: Analysis only evaluates proximity and not school quality; also, additional CA Geoportal data are listed below for future analysis.
SACOG Data
CA Geoportal Data
Results
SACOG Data
City of Sacramento Data
Results
SACOG Data
Results
SACOG Data
This notebook will require some basic understanding of the Python programming language, Jupyter platform and data analysis concepts. It is based on this tutorial and Github Repo.
Jupyter is a powerful, open-source and lightweight collaborative tool. It provides the tools needed to run data analysis, visualization, statistics and data science out of the box. In addition, it has gained acceptance in industry and academia for collaborating on projects and publishing work.
Jupyter combines text and code, with the programming run-time built into the platform, so there is no need to install additional software. Text is written in the markdown format (a lightweight markup that renders to HTML), and code can be written in several languages. A notebook is organized by cells which can consist of either text or code; placed together, they can be sent as a single document to share/publish work.
Notebooks are organized by cells, which mainly consist of text (in markdown) and code (Python). A notebook operates like a hybrid between an MS Word and an Excel file: the entire file reads like a document, while the cells operate like a spreadsheet. To get started, feel free to scroll through the cells and navigate around them for a quick tour. Here is a breakdown of how to view/edit cells:
Navigation
Notes
Jupyter
Python
Markdown
# 01 - load modules into notebook
# install pip package in current kernel; run only for initial install:
# https://medium.com/@rohanguptha.bompally/python-data-visualization-using-folium-and-geopandas-981857948f02
# !pip install descartes
# https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
# import sys
# !{sys.executable} -m pip install --upgrade pip
# numerical data modules
import numpy as np
import scipy
# data analysis module
import pandas as pd
# data visualization module
# import matplotlib.pyplot as plt
# adjust plot settings
# %matplotlib inline
# data visualization module
# https://seaborn.pydata.org/
# import seaborn as sns; sns.set(color_codes=True)
# geospatial data modules
import folium
from folium.plugins import MarkerCluster
import os
import json
# geospatial data modules
# from shapely.geometry import Point, Polygon
# from shapely.geometry import shape, LineString, Point
# import geojsonio
# from descartes import PolygonPatch
import geopandas as gpd
import fiona
# 02.00 - data functions
# function to read csv file
# https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url/41880513#41880513
def read_data(path):
    df = pd.read_csv(path)
    return df
# function to output csv file
def output_result(df, filepath):
    df.to_csv(filepath)
# function to show table info
def data_profile(df, msg):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Table Info: %s ***' % msg, '\n')
    print(df.info(), '\n')
    print('*** Table Info: Table Dimensions ***', '\n')
    print(df.shape, '\n')
# function to show unique value for given column
def show_unique(df, col):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Unique Values: (%s) ***' % col, '\n')
    print(df[col].unique(), '\n')
# function to output summary stats
def summary_stats(df, col):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Summary Stats: (%s) ***' % col, '\n')
    print(df[col].describe(), '\n')
    # print(col.describe())
# function to rename columns
# https://www.geeksforgeeks.org/how-to-rename-columns-in-pandas-dataframe/
def rename_col(df, old_col, new_col):
    df.rename(
        columns={old_col: new_col},
        inplace=True
    )
    return df
# function convert col to numeric type
# reference: https://stackoverflow.com/questions/47333227/pandas-valueerror-cannot-convert-float-nan-to-integer
def convert_num(df, col):
    # convert type; values that cannot be parsed become NaN
    df[col] = pd.to_numeric(
        df[col],
        errors='coerce'
    )
    return df
# convert string to datetime
# reference: https://stackoverflow.com/questions/32888124/pandas-out-of-bounds-nanosecond-timestamp-after-offset-rollforward-plus-adding-a
def convert_date(df, col):
    # convert type; values that cannot be parsed become NaT
    df[col] = pd.to_datetime(
        df[col],
        infer_datetime_format=True,
        errors='coerce'
    )
    return df
# function convert col to string type
def convert_str(df, col):
    # convert type; assign the result back, since astype returns a new Series
    df[col] = df[col].astype(str)
    return df
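The conversion helpers above rely on `errors='coerce'`, which turns unparseable values into missing values instead of raising an error. The sketch below shows that behavior on a small, hypothetical table (the `enrollment` and `opened` columns are illustrative and not taken from the project datasets), so the number of failed conversions can be checked before mapping.

```python
import pandas as pd

# hypothetical sample data; column names are illustrative only
df_demo = pd.DataFrame({
    'enrollment': ['120', '85', 'n/a'],
    'opened': ['2019-08-15', 'not a date', '2020-01-06'],
})

# invalid entries are coerced to NaN / NaT instead of raising an error
df_demo['enrollment'] = pd.to_numeric(df_demo['enrollment'], errors='coerce')
df_demo['opened'] = pd.to_datetime(df_demo['opened'], errors='coerce')

# count the values that failed to convert in each column
n_bad_num = int(df_demo['enrollment'].isna().sum())
n_bad_date = int(df_demo['opened'].isna().sum())
print(n_bad_num, n_bad_date)
```

Counting missing values right after a coercing conversion is one simple way to notice silently dropped data.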
# 02.01 - data import
# sf open data portal - sfpd reports (2003-2018)
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
# df_data = read_data("data/sfpd_report_2003-18.csv")
# note: reduce the original file (500 MB) by keeping only the first 10k rows, then overwrite the file
# https://datacarpentry.org/python-ecology-lesson/03-index-slice-subset/index.html
# df_data = df_data[0:10000]
# output_result(df_data, "data/sfpd_report_2003-18.csv")
# read in reduced file after processing steps above
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
df_sfpd = read_data("data/sfpd_report_2003-18.csv")
# ca geoportal - education dataset (2019-20)
# https://gis.data.ca.gov/datasets/CDEGIS::california-schools-2019-20
df_school = read_data("data/ca_school_2019-20.csv")
# sacog - lihm community dataset (2016)
# https://data.sacog.org/datasets/d37cca2c798b48b9966b62e4bb1f380d_0
sacog_lihm_csv = read_data("data/sacog_lihm_areas_2016.csv")
# 02.02 - data processing
# subset dataset by row values; for example, schools by county
# https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values
df_school_sac = df_school[
df_school['CountyName'].str.contains('Sacramento')
]
df_school_amador = df_school[
df_school['CountyName'].str.contains('Amador')
]
df_school_placer = df_school[
df_school['CountyName'].str.contains('Placer')
]
df_school_yolo = df_school[
df_school['CountyName'].str.contains('Yolo')
]
df_school_yuba = df_school[
df_school['CountyName'].str.contains('Yuba')
]
# todo: process geojson
# https://opendata.arcgis.com/datasets/f7f818b0aa7a415192eaf66f192bc9cc_0.geojson
# df_school_geojson = read_data("data/ca_school_2019-20.geojson")
# data profile data after import
# data_profile(df_sfpd, 'SFPD Reports (2003-18)')
# data_profile(df_school, 'CA Schools (2019-20)')
# data_profile(df_school_sac, 'CA Schools: Sacramento County (2019-20)')
# data_profile(df_school_amador, 'CA Schools: Amador County (2019-20)')
# data_profile(df_school_placer, 'CA Schools: Placer County (2019-20)')
# data_profile(df_school_yolo, 'CA Schools: Yolo County (2019-20)')
# data_profile(df_school_yuba, 'CA Schools: Yuba County (2019-20)')
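The GeoJSON to-do in the cell above cannot reuse `read_data`, because that helper wraps `pd.read_csv` and GeoJSON is a JSON format. The sketch below shows one possible approach using only the standard library: it pulls point coordinates out of a feature collection. The `geojson_points` helper, the sample features and the `School` property name are assumptions for illustration, not the schema of the CA Geoportal file.

```python
import json

# hypothetical two-feature sample; the property name 'School' is an assumption
geojson_text = '''
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"School": "A"},
   "geometry": {"type": "Point", "coordinates": [-121.49, 38.58]}},
  {"type": "Feature", "properties": {"School": "B"},
   "geometry": {"type": "Point", "coordinates": [-121.74, 38.54]}}
]}
'''

def geojson_points(text):
    """Return (name, latitude, longitude) tuples for Point features."""
    collection = json.loads(text)
    rows = []
    for feature in collection['features']:
        geometry = feature['geometry']
        # skip features without a geometry or with a non-point geometry
        if geometry is None or geometry['type'] != 'Point':
            continue
        # GeoJSON stores coordinates as [longitude, latitude]
        lon, lat = geometry['coordinates'][:2]
        rows.append((feature['properties'].get('School'), lat, lon))
    return rows

rows = geojson_points(geojson_text)
print(rows)
```

Note the coordinate order: GeoJSON uses longitude first, while folium expects latitude first.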
# 03.00 - map functions
# tutorial - folium plot with cluster markers
# https://python-visualization.github.io/folium/quickstart.html
# https://www.jpytr.com/post/analysinggeographicdatawithfolium/
# https://github.com/python-visualization/folium/blob/master/examples/MarkerCluster.ipynb
# function to plot coordinates with cluster markers
def plot_cluster(col1, col2, icon_color, cluster_name, map):
    # zip lat/long into list
    location = list(zip(col1, col2))
    # icon = [folium.Icon(color='red') for _ in range(len(location_sac))]
    icon = [folium.Icon(color=icon_color) for _ in range(len(location))]
    # plot clusters
    cluster = MarkerCluster(
        # name='CA Schools: Sac County, 2019-20 (Red)',
        name=cluster_name,
        control=True,
        locations=location,
        icons=icon
    )
    map.add_child(cluster)
    return map
# function to create choropleth plot
# usage: input json and csv data; outputs map object, then pass to plot func
def plot_choropleth(data_json, data_csv, col_array, col_key, fill, name, map):
    # choropleth plot with settings
    choropleth = folium.Choropleth(
        geo_data=data_json,
        name='choropleth',
        data=data_csv,
        # columns=['OBJECTID', 'Minority'],
        columns=col_array,
        # key_on='feature.properties.OBJECTID',
        key_on=col_key,
        fill_color=fill,
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name=name,
        highlight=True,
        line_color='black'
    ).add_to(map)
    # add hover-over tooltip
    choropleth.geojson.add_child(
        folium.features.GeoJsonTooltip(['OBJECTID'], labels=False)
    )
    return map
# function to add map controls and title
# https://stackoverflow.com/questions/37466683/create-a-legend-on-a-folium-map
# https://stackoverflow.com/questions/61928013/adding-a-title-or-text-to-a-folium-map
def plot_map(loc_title, file_path, map):
    # add legend and layer control
    map.add_child(folium.map.LayerControl())
    # add map title
    loc = loc_title
    title_html = '''
    <h3 align="center" style="font-size:16px"><b>{}</b></h3>
    '''.format(loc)
    map.get_root().html.add_child(folium.Element(title_html))
    # display and save map
    display(map)
    map.save(file_path)
# function to plot geojson
# https://medium.com/@rohanguptha.bompally/python-data-visualization-using-folium-and-geopandas-981857948f02
def plot_geojson(json_file, layer_title, style, map):
    # note: add json to map; however, geojson function only reads json
    # https://shallowsky.com/blog/mapping/folium-with-shapefiles.html
    folium.GeoJson(
        json_file,
        name=layer_title,
        control=True,
        style_function=lambda x: style
    ).add_to(map)
    return map
# function to import geojson, then convert to json
# https://github.com/lesley2958/twilio-geospatial
def geojson2json(file_path):
    # import geojson and view data source
    # https://raw.githubusercontent.com/lesley2958/twilio-geospatial/master/data/states.geojson
    sacog_lihm_geojson = gpd.read_file(file_path)
    # print(sacog_lihm_geojson.head(5), '\n')
    # convert to json
    sacog_lihm_json = sacog_lihm_geojson.to_json()
    # print(sacog_lihm_json)
    return sacog_lihm_json
# sacog - lihm areas (2016)
# https://data.sacog.org/datasets/d37cca2c798b48b9966b62e4bb1f380d_0?selectedAttribute=COUNTYFP10
sacog_lihm_json = geojson2json('data/sacog_lihm_areas_2016.geojson')
# city of sac - existing bike facilities (2018)
# http://data.cityofsacramento.org/datasets/15f8e048d9ad4442a3e12b6182bcd4f2_1?geometry=-121.899%2C38.464%2C-121.028%2C38.652
citysac_bike_fac_json = geojson2json('data/citysac_bike_fac_2018.geojson')
# city of sac - bikeshare opportunity areas (2016)
# http://data.cityofsacramento.org/datasets/8439c4e091a2434aafee1cf888b061f0_0?geometry=-122.330%2C38.373%2C-120.589%2C38.749
citysac_bikeshare_json = geojson2json('data/citysac_bikeshare_areas_2016.geojson')
# sacog - hfta-scs data (2020)
# http://data.sacog.org/datasets/high-frequency-transit-area-mtp-scs-2020
sacog_hfta_json = geojson2json('data/sacog_htfa_2020.geojson')
# sacog - hq-transit, sb375 data (2017)
# http://data.sacog.org/datasets/high-quality-transit-2036?geometry=-123.179%2C38.303%2C-119.697%2C39.053
sacog_sb375_json = geojson2json('data/sacog_sb375_2017.geojson')
# sacog - calenviroscreen3.0, top 25% tracts
# http://data.sacog.org/datasets/calenviroscreen-3-0-top-25-tracts?geometry=-123.212%2C38.343%2C-119.729%2C39.093
sacog_calenv_json = geojson2json('data/sacog_calenv_top25.geojson')
# sacog - air pollution, pm2.5 planning areas (2018)
# http://data.sacog.org/datasets/sacramento-pm-2-5-planning-area-
sacog_pm25_json = geojson2json('data/sacog_pm25_2018.geojson')
# 03.01 - map plot: sfpd tutorial
# note: module based on tutorial below
# https://blog.dominodatalab.com/creating-interactive-crime-maps-with-folium/
# sf open data portal - sfpd reports (2003-2018)
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
# set origin
latlong_sf = (37.76, -122.45)
# create map
map_sfpd = folium.Map(location=latlong_sf, zoom_start=12)
# call function to plot coordinates with cluster markers
map_sfpd = plot_cluster(
df_sfpd.Y,
df_sfpd.X,
'red',
'SFPD: Crime Reports, 2003-2018 (Red)',
map_sfpd
)
# call function to add map controls and title
plot_map(
'Crime Report Map: City of San Francisco (2003-2018)',
'maps/03.01_sfpd_reports.html',
map_sfpd
)
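The cluster plot above zips the latitude and longitude columns directly into marker locations. Open datasets often contain rows with missing coordinates, and folium may reject locations that contain missing values, so it can help to filter them first. The following is a minimal sketch using only the standard library; `clean_locations` and the sample coordinates are hypothetical and not part of the notebook.

```python
import math

def clean_locations(lats, lons):
    """Zip latitude/longitude pairs, skipping missing (None/NaN) values."""
    locations = []
    for lat, lon in zip(lats, lons):
        # skip rows where either coordinate is absent
        if lat is None or lon is None:
            continue
        # skip rows where either coordinate is NaN
        if math.isnan(lat) or math.isnan(lon):
            continue
        locations.append((lat, lon))
    return locations

# hypothetical sample coordinates with one missing latitude
lats = [37.77, float('nan'), 37.80]
lons = [-122.42, -122.45, -122.41]
locations = clean_locations(lats, lons)
print(locations)
```

The cleaned list could then be passed to the cluster plot in place of the raw zipped columns.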
Map Legend
Results
SF Open Data
Tutorials
# 03.02 - map plot: sac area with lihm areas and schools
# ca geoportal - education dataset (2019-20)
# https://gis.data.ca.gov/datasets/CDEGIS::california-schools-2019-20
# https://opendata.arcgis.com/datasets/f7f818b0aa7a415192eaf66f192bc9cc_0.geojson
# sacog - lihm shapefile (2016)
# https://data.sacog.org/datasets/d37cca2c798b48b9966b62e4bb1f380d_0
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_school = folium.Map(location=latlong_sac, zoom_start=8)
# plot sacog lihm data
style_sac_lihm = {
'line_opacity': 0.5
}
# call function to plot geojson
map_sac_lihm_school = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Blue)',
style_sac_lihm,
map_sac_lihm_school
)
# call function to plot coordinates with cluster markers
map_sac_lihm_school = plot_cluster(
df_school_sac.Latitude,
df_school_sac.Longitude,
'red',
'CA Schools: Sac County, 2019-20 (Red)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_amador.Latitude,
df_school_amador.Longitude,
'green',
'CA Schools: Amador County, 2019-20 (Green)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_placer.Latitude,
df_school_placer.Longitude,
'blue',
'CA Schools: Placer County, 2019-20 (Blue)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_yolo.Latitude,
df_school_yolo.Longitude,
'orange',
'CA Schools: Yolo County, 2019-20 (Orange)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_yuba.Latitude,
df_school_yuba.Longitude,
'purple',
'CA Schools: Yuba County, 2019-20 (Purple)',
map_sac_lihm_school
)
# call function to add map controls and title
plot_map(
'Sacramento Area Asset Map: LIHM Communities and Schools',
'maps/03.02_sac_lihm_school.html',
map_sac_lihm_school
)
Map Legend
Results
Note: Analysis only evaluates proximity and not school quality; also, additional CA Geoportal data are listed below for future analysis.
SACOG Data
CA Geoportal Data
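The map above shows proximity visually only. As a possible next step, a school location could be tested against a community boundary with a point-in-polygon check. The sketch below is a plain-Python ray-casting test under simplifying assumptions (a single outer ring, no holes, planar coordinates); the `point_in_polygon` helper and the square boundary are hypothetical and do not represent actual LIHM areas. Libraries such as shapely, which is imported but commented out in the first cell, provide equivalent tests.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # count crossings to the right of the point
            if lon < x_cross:
                inside = not inside
    return inside

# hypothetical square boundary around the Sacramento origin used in this notebook
square = [(-121.6, 38.5), (-121.4, 38.5), (-121.4, 38.7), (-121.6, 38.7)]
inside_pt = point_in_polygon(-121.478851, 38.575764, square)
outside_pt = point_in_polygon(-121.0, 38.575764, square)
print(inside_pt, outside_pt)
```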
# 03.03 - map plot: city of sac, lihm areas and bike facilities
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_bike = folium.Map(location=latlong_sac, zoom_start=12)
# plot sacog lihm data
style_sac_lihm = {
'fillColor': '#ff4500',
'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_bike = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
style_sac_lihm,
map_sac_lihm_bike
)
style_bike_fac = {
'fillColor': '#008000',
'color': '#008000'
}
map_sac_lihm_bike = plot_geojson(
citysac_bike_fac_json,
'City of Sac: Bike Facilities, 2018 (Green)',
style_bike_fac,
map_sac_lihm_bike
)
style_bikeshare = {
'fillColor': '#9370db',
'color': '#9370db'
}
map_sac_lihm_bike = plot_geojson(
citysac_bikeshare_json,
'City of Sac: Bikeshare Opportunity Areas, 2016 (Purple)',
style_bikeshare,
map_sac_lihm_bike
)
# call function to add map controls and title
plot_map(
'City of Sacramento Asset Map: LIHM Communities and Bike Facilities',
'maps/03.03_sac_lihm_bike.html',
map_sac_lihm_bike
)
Map Legend
Results
SACOG Data
City of Sacramento Data
# 03.04 - map plot: city of sac, lihm areas and public transit
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_transit = folium.Map(location=latlong_sac, zoom_start=12)
# plot sacog lihm data
style_sac_lihm = {
'fillColor': '#ff4500',
'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_transit = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
style_sac_lihm,
map_sac_lihm_transit
)
style_sac_hfta = {
'fillColor': '#9370db',
'color': '#9370db'
}
map_sac_lihm_transit = plot_geojson(
sacog_hfta_json,
'SACOG: High Frequency Transit Areas (HFTAs), 2020 (Purple)',
style_sac_hfta,
map_sac_lihm_transit
)
style_sac_sb375 = {
'fillColor': '#008000',
'color': '#008000'
}
map_sac_lihm_transit = plot_geojson(
sacog_sb375_json,
'SACOG: High Quality Transit (SB375), 2017 (Green)',
style_sac_sb375,
map_sac_lihm_transit
)
# call function to add map controls and title
plot_map(
'City of Sacramento Asset Map: LIHM Communities and Transit',
'maps/03.04_sac_lihm_transit.html',
map_sac_lihm_transit
)
Map Legend
Results
SACOG Data
# 03.05 - map plot: city of sac, lihm areas and pollution
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_pollution = folium.Map(location=latlong_sac, zoom_start=12)
# plot sacog lihm data
style_sac_lihm = {
'fillColor': '#ff4500',
'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_pollution = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
style_sac_lihm,
map_sac_lihm_pollution
)
style_sac_pm25 = {
'fillColor': '#9370db',
'color': '#9370db'
}
map_sac_lihm_pollution = plot_geojson(
sacog_pm25_json,
'SACOG: Air Pollution PM 2.5 Planning Areas, 2018 (Purple)',
style_sac_pm25,
map_sac_lihm_pollution
)
style_sac_calenv = {
'fillColor': '#008000',
'color': '#008000'
}
map_sac_lihm_pollution = plot_geojson(
sacog_calenv_json,
'SACOG: CalEnviroScreen3.0, Top 25% Tracts (Green)',
style_sac_calenv,
map_sac_lihm_pollution
)
# call function to add map controls and title
plot_map(
'City of Sacramento Asset Map: LIHM Communities and Pollution Levels',
'maps/03.05_sac_lihm_pollution.html',
map_sac_lihm_pollution
)
Map Legend
Results
SACOG Data
# 03.06 - choropleth plot: lihm community data (2016)
# note: plot based on tutorials listed below
# https://www.nagarajbhat.com/post/folium-visualization/
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sacog_lihm_choro = folium.Map(
location=latlong_sac,
zoom_start=12,
tiles='cartodbpositron'
)
# call function to show table info
# data_profile(sacog_lihm_csv, 'SACOG LIHM Data - 2016')
# create/plot map
map_sacog_lihm_choro = plot_choropleth(
sacog_lihm_json,
sacog_lihm_csv,
['OBJECTID', 'Minority'],
'feature.properties.OBJECTID',
'YlGn',
'SACOG LIHM Communities, 2016 (Green)',
map_sacog_lihm_choro
)
# call function to add map controls and title
plot_map(
'City of Sacramento Asset Map: LIHM Communities by Poverty Level',
'maps/03.06_plot_sacog_lihm_choropleth.html',
map_sacog_lihm_choro
)
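The choropleth above joins the CSV table to the GeoJSON features through the `OBJECTID` key. If some keys in the table have no matching feature, those rows will not be shaded as expected. A quick consistency check before plotting can catch this. The sketch below assumes the key lives under each feature's `properties`; the `unmatched_keys` helper and the sample data are hypothetical.

```python
import json

# hypothetical sample data mirroring the 'OBJECTID' key used above
geojson_text = json.dumps({
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'OBJECTID': 1}, 'geometry': None},
        {'type': 'Feature', 'properties': {'OBJECTID': 2}, 'geometry': None},
    ],
})
csv_ids = [1, 2, 3]

def unmatched_keys(geojson_str, ids, prop='OBJECTID'):
    """Return ids from the table that have no matching GeoJSON feature."""
    features = json.loads(geojson_str)['features']
    feature_ids = {f['properties'][prop] for f in features}
    return sorted(set(ids) - feature_ids)

missing = unmatched_keys(geojson_text, csv_ids)
print(missing)
```

An empty result suggests every table row has a feature to color; a non-empty one points to keys worth inspecting, for example integer versus string identifiers.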